160        Bioinformatics

-operation g \
-nastring . \
-vcfinpu

Figure 4.22 shows the annotations of all SARS-CoV-2 variants.

4.5  SUMMARY

High-throughput sequencing makes variant discovery much easier than traditional methods such as microarrays. Raw sequencing data are used to detect variants, including base substitutions, insertions, deletions, and structural variants. Variants can occur in any region of the genome; however, most studies focus on variants that affect gene function. The consequences of a variant depend on the region it affects. Variants may be deleterious and implicated in health conditions and diseases such as cancer. They may also give rise to new strains of bacteria and viruses, such as the recent, more infectious and lethal variants of SARS-CoV-2 or more antibiotic-resistant bacterial strains. This is why variant discovery from sequencing data has gained importance, and it is now widely used in genetics, medical diagnosis, and drug discovery.

Greater sequencing depth, paired-end sequencing, and long reads make variant detection more accurate. They also allow the detection of large variants such as structural variants and large insertions and deletions.
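A short worked example shows why depth matters. It rests on simplifying assumptions that are not part of the text above. Consider a heterozygous site where each read independently carries either allele with probability 1/2. Assume also a hypothetical caller that needs at least two reads supporting the alternative allele. The number of supporting reads X at depth n is then binomial, X ~ Bin(n, 1/2):

```latex
\[
P(X \ge 2) = 1 - P(X = 0) - P(X = 1)
           = 1 - \left(\tfrac{1}{2}\right)^{n} - n\left(\tfrac{1}{2}\right)^{n}
           = 1 - \frac{n+1}{2^{n}}
\]
\[
n = 5:\; 1 - \tfrac{6}{32} \approx 0.81, \qquad
n = 10:\; 1 - \tfrac{11}{1024} \approx 0.99, \qquad
n = 20:\; 1 - \tfrac{21}{2^{20}} \approx 0.99998
\]
```

The model ignores sequencing errors and uneven coverage, so the numbers show only the trend. At 5x, such a variant is missed almost one time in five. At 20x, it is almost never missed.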

Variant calling pipelines take SAM/BAM files from whole-genome, whole-transcriptome, or targeted gene sequencing. They identify the positions where the bases in a sample differ from the bases at the same positions in the reference genome. Variant callers follow one of two approaches. The first, used by bcftools, is based on a consensus formed by collapsing the piled-up aligned reads at each position. The second, used by more recent variant callers such as GATK, is based on haplotypes, that is, groups of nearby variants that are likely to be inherited together. GATK4 is one of the most widely used variant callers. It is typically run within the GATK Best Practices workflow, which is designed to produce accurate variant calls.
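A toy script can make the consensus idea concrete. It is an illustration only, not how bcftools is implemented. Real callers such as bcftools also weigh base and mapping qualities and compute genotype likelihoods, while this sketch simply takes a majority vote at each position. It assumes a hypothetical, simplified input with one line per position: CHROM, POS, REF, and a string of the bases observed in the overlapping reads. This is not the real samtools/bcftools mpileup format. The function name consensus_call and its depth threshold are also hypothetical.

```shell
#!/usr/bin/env bash
# Toy consensus-based SNV caller: illustration only, NOT how bcftools works.
# Assumed input (hypothetical simplified format, not real mpileup output),
# one line per reference position:
#   CHROM  POS  REF  BASES
# where BASES holds the base observed in each overlapping read, e.g. "CCTTT".
# Prints CHROM POS REF ALT DEPTH where the consensus (most frequent) base
# differs from the reference and depth >= MIN_DEPTH (argument 1, default 3).
consensus_call() {
  local min_depth="${1:-3}"
  awk -v min_depth="$min_depth" '
  {
    ref = toupper($3); bases = toupper($4)
    depth = length(bases)
    if (depth < min_depth) next          # too few reads to trust a call
    split("", count)                     # reset the per-position base counts
    for (i = 1; i <= depth; i++) count[substr(bases, i, 1)]++
    # Collapse the pileup into one consensus base; ties keep the base
    # that comes first in A, C, G, T order.
    best = ""; best_n = 0
    split("A C G T", order, " ")
    for (j = 1; j <= 4; j++) {
      if (count[order[j]] > best_n) { best = order[j]; best_n = count[order[j]] }
    }
    if (best != "" && best != ref) print $1, $2, ref, best, depth
  }'
}
```

For example, `printf 'chr1 1 A AAAAA\nchr1 2 C CCTTT\nchr1 3 G GG\n' | consensus_call 3` prints `chr1 2 C T 5`. Position 1 matches the reference, and position 2 has a T consensus. Position 3 is skipped because only two reads cover it. A haplotype-based caller such as GATK HaplotypeCaller works differently: it reassembles the reads of an active region into candidate haplotypes instead of voting one position at a time.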

After variants have been called with a variant calling program and filtered, annotation programs can assign functional information to them.
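In its simplest form, annotation looks up which gene a variant falls in. The sketch below illustrates only that step. Real annotation programs typically also report whether a variant is exonic, intronic, or intergenic and predict its effect on the encoded protein. The input formats, the function name annotate_genes, and the gene table are hypothetical and chosen for illustration. Coordinates are assumed to be 1-based and inclusive.

```shell
#!/usr/bin/env bash
# Toy gene-based annotation: illustration only, not ANNOVAR or any real tool.
# Assumed inputs (hypothetical simplified formats, 1-based inclusive coordinates):
#   genes file (argument 1): GENE  CHROM  START  END   (must be non-empty)
#   variants on stdin:       CHROM  POS  REF  ALT
# Prints each variant followed by the overlapping gene name(s), or "intergenic".
annotate_genes() {
  local genes_file="$1"
  awk '
  NR == FNR {                    # first file: load the gene intervals
    n++; name[n] = $1; chr[n] = $2; gstart[n] = $3 + 0; gend[n] = $4 + 0
    next
  }
  {                              # second input (stdin): one variant per line
    pos = $2 + 0; hits = ""
    for (i = 1; i <= n; i++) {
      if ($1 == chr[i] && pos >= gstart[i] && pos <= gend[i]) {
        hits = (hits == "" ? name[i] : hits "," name[i])
      }
    }
    print $1, $2, $3, $4, (hits == "" ? "intergenic" : hits)
  }' "$genes_file" -
}
```

Consider a hypothetical gene table containing `geneA chrX 100 500` and `geneB chrX 800 1200`. The variant `chrX 250 A G` is reported as `geneA`, and `chrX 600 C T` as `intergenic`. The linear scan over all genes keeps the sketch short. A real tool would use an indexed interval structure to handle genome-scale gene tables.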

FIGURE 4.22  All-variant annotation file of SARS-CoV-2.